GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

Author: Crankshaw Daniel
Dave Ankur
Franklin Michael J.
Gonzalez Joseph E.
Stoica Ion
Xin Reynold S.
Publication date: 11/02/2014

From social networks to language modeling, the growing scale and importance of graph data have driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph analytics pipelines compose graph-parallel and data-parallel systems using external storage systems, leading to extensive data movement and a complicated programming model. To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques such as automatic join rewrites to efficiently implement these graph-parallel operators. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves performance comparable to that of specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use.
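To make the idea of casting a graph-parallel operator in relational terms more concrete, the following is a minimal single-machine sketch in plain Python. It is not GraphX's API or implementation: the function name `connected_components` and the table layout are assumptions made for illustration. Each iteration joins the edge table with the vertex labels, groups the resulting messages by receiver with a `min` aggregate, and joins the result back onto the vertex table, which is roughly the shape of a Pregel-style superstep.

```python
def connected_components(vertex_ids, edges):
    """Label every vertex with the smallest vertex id in its (undirected) component.

    Illustrative only: a Pregel-style loop written as join / group-by / join
    over in-memory tables, not a distributed implementation.
    """
    # Vertex table: vertex id -> current label (initially its own id).
    labels = {v: v for v in vertex_ids}
    while True:
        # Join: pair every edge with the current labels of both endpoints.
        triplets = [(src, labels[src], dst, labels[dst]) for src, dst in edges]

        # Map + group-by: each endpoint is offered the other endpoint's label,
        # and only the minimum offer per receiver is kept.
        inbox = {}
        for src, src_label, dst, dst_label in triplets:
            for receiver, offered in ((dst, src_label), (src, dst_label)):
                inbox[receiver] = min(offered, inbox.get(receiver, offered))

        # Join back onto the vertex table, keeping only strict improvements.
        updates = {v: m for v, m in inbox.items() if m < labels[v]}
        if not updates:
            return labels  # no vertex changed, so the labels have converged
        labels.update(updates)


components = connected_components([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)])
```

Here vertices 1 to 3 end up sharing one label, vertices 4 and 5 share another, and the isolated vertex 6 keeps its own id. A real system would additionally partition both tables and optimize the joins, which this sketch does not attempt.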

arXiv.org e-Print Archive

CiteSeerX

[Demo] Low-latency spark queries on updatable data

Author: Boncz P.A. (Peter)
Dave A. (Ankur)
Ghit B. (Bogdan)
Uta A. (Alexandru)
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/06/2019

As data science gets deployed more and more into operational applications, it becomes important for data science frameworks to be able to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, in datasets that are continuously growing.

VU Research Portal

Crossref

CWI's Institutional Repository

Prevalence of childhood asthma and its immediate outcome - At tertiary care rural hospital

Author: Dave Hemal
Dosi Jigar
Mehta Ankur
Muley Prasad
Patel Maitrey
Shah Unnati
Publication venue: Atharva Scientific Publications

Introduction: Asthma is a chronic inflammatory disorder of the airways resulting in episodic airway obstruction. Globally, childhood asthma is increasing in prevalence, despite improvements in investigation and treatment. Childhood asthma has appeared more prevalent in urban populations and is now seen even in rural areas of India. Objectives: To determine the prevalence and to assess the risk factors, severity, and immediate outcome of the treatment offered to asthmatic children in a tertiary rural hospital. Materials and Methods: All diagnosed asthmatic children up to 18 years of age were enrolled in the study. All patients with pulmonary Koch’s, congenital heart disease, or chronic lung disease were excluded from the study. The clinical profile was noted in recruited patients. Results: The prevalence of childhood asthma among children visiting our department was 3.93%. In 58 children (48.33%), the age of onset was before 6 years. Asthma was more prevalent in boys. 116 (96.66%) children presented with a complaint of cough, and 118 (98.33%) children had associated breathlessness. Common precipitating factors were change in season (71.66%), pollen allergy (58.33%), air pollution (45.00%), and passive smoking (23.33%). Exercise-induced asthma was seen in 55% of cases and diurnal variation in 60%, and 28.33% of children had a family history of atopic disease. The majority of the patients were undernourished. The average duration of stay in persistent asthma was 1.8 times longer than in intermittent asthma. Conclusion: A significant number of patients become symptomatic before 6 years of age. Preventing children's exposure to passive smoking, environmental improvement, and allergen avoidance are major aspects of the prevention of asthma exacerbations.

Atharva Scientific Publications (E-Journals)

In-Memory Indexed Caching for Distributed Data Processing

Author: Boncz P.A. (Peter)
Dave A. (Ankur)
Ghit B. (Bogdan)
Rellermeyer J. (Jan)
Uta A. (Alexandru)
Publication date: 12/12/2021

Powerful abstractions such as dataframes are only as efficient as their underlying runtime system. The de facto distributed data processing framework, Apache Spark, is poorly suited for modern cloud-based data-science workloads due to its outdated assumptions: static datasets analyzed using coarse-grained transformations. In this paper, we introduce the Indexed DataFrame, an in-memory cache that supports a dataframe abstraction which incorporates indexing capabilities to support fast lookup and join operations. Moreover, it supports appends with multi-version concurrency control. We implement the Indexed DataFrame as a lightweight, standalone library which can be integrated with minimum effort in existing Spark programs. We analyze the performance of the Indexed DataFrame in cluster and cloud deployments with real-world datasets and benchmarks using both Apache Spark and Databricks Runtime. In our evaluation, we show that the Indexed DataFrame significantly speeds up query execution when compared to a non-indexed dataframe, incurring modest memory overhead.
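As a rough illustration of how an indexed, append-only cache can serve point lookups against a fixed snapshot, here is a small Python sketch. It is not the Indexed DataFrame library or its API: the class `IndexedTable`, its methods, and the choice of using the row count as a version number are all assumptions made for this example.

```python
from collections import defaultdict


class IndexedTable:
    """Toy append-only table with a hash index and snapshot reads (hypothetical)."""

    def __init__(self, key_column):
        self.key_column = key_column
        self.rows = []                  # rows in append order
        self.index = defaultdict(list)  # key value -> positions of matching rows

    def append(self, row):
        """Append a row (a dict) and return the new version, i.e. the row count."""
        self.index[row[self.key_column]].append(len(self.rows))
        self.rows.append(row)
        return len(self.rows)

    def lookup(self, key, version=None):
        """Return rows matching `key` that were visible at `version`.

        With version=None the latest state is read. Rows appended after
        `version` are ignored, so a reader holding an older version keeps
        seeing a stable snapshot while appends continue.
        """
        limit = len(self.rows) if version is None else version
        return [self.rows[i] for i in self.index.get(key, []) if i < limit]


table = IndexedTable("user")
table.append({"user": "a", "friend": "b"})
snapshot = table.append({"user": "b", "friend": "c"})
table.append({"user": "a", "friend": "d"})

latest = table.lookup("a")
as_of_snapshot = table.lookup("a", version=snapshot)
```

In this sketch, `latest` contains both rows for user "a", while `as_of_snapshot` contains only the first one. The lookup touches only the positions recorded for the key rather than scanning every row, which is the property that makes an index attractive for joins and point queries.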

CWI's Institutional Repository

Representativeness of Eddy-Covariance flux footprints for areas surrounding AmeriFlux sites

Author: Arain M. Altaf
Arkebauer Tim J.
Baldocchi Dennis
Bernacchi Carl
Billesbach Dave
Biraud Sébastien C.
Black T. Andrew
Blanken Peter D.
Bohrer Gil
Bracho Rosvel
Brown Shannon
Brunsell Nathaniel A.
Chan W. Stephen
Chen Jiquan
Chen Xingyuan
Chu Housen
Clark Kenneth
Dengel Sigrid
Desai Ankur R.
Duman Tomer
Durden David
Fares Silvano
Forbrich Inke
Gamon John A.
Gough Christopher M.
Griffis Timothy
Helbig Manuel
Hollinger David
Humphreys Elyn
Ikawa Hiroki
Iwata Hiroki
Ju Yang
Knowles John F.
Knox Sara H.
Kobayashi Hideki
Kumar Jitendra
Luo Xiangzhong
Metzger Stefan
Ouyang Zutao
Torn Margaret S.
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 15/05/2021

Large datasets of greenhouse gas and energy surface-atmosphere fluxes measured with the eddy-covariance technique (e.g., FLUXNET2015, AmeriFlux BASE) are widely used to benchmark models and remote-sensing products. This study addresses one of the major challenges facing model-data integration: To what spatial extent do flux measurements taken at individual eddy-covariance sites reflect model- or satellite-based grid cells? We evaluate flux footprints (the temporally dynamic source areas that contribute to measured fluxes) and the representativeness of these footprints for target areas (e.g., within 250–3000 m radii around flux towers) that are often used in flux-data synthesis and modeling studies. We examine the land-cover composition and vegetation characteristics, represented here by the Enhanced Vegetation Index (EVI), in the flux footprints and target areas across 214 AmeriFlux sites, and evaluate potential biases as a consequence of the footprint-to-target-area mismatch. Monthly 80% footprint climatologies vary across sites and through time ranging four orders of magnitude from 10³ to 10⁷ m² due to the measurement heights, underlying vegetation- and ground-surface characteristics, wind directions, and turbulent state of the atmosphere. Few eddy-covariance sites are located in a truly homogeneous landscape. Thus, the common model-data integration approaches that use a fixed-extent target area across sites introduce biases on the order of 4%–20% for EVI and 6%–20% for the dominant land cover percentage. These biases are site-specific functions of measurement heights, target area extents, and land-surface characteristics. We advocate that flux datasets need to be used with footprint awareness, especially in research and applications that benchmark against models and data products with explicit spatial information.
We propose a simple representativeness index based on our evaluations that can be used as a guide to identify site-periods suitable for specific applications and to provide general guidance for data use.

DigitalCommons@University of Nebraska

ECOSTRESS: NASA's next generation mission to measure evapotranspiration from the International Space Station

Author: Anderson Martha
Anderson Ray G.
Aragon Bruno
Arain M. Altaf
Baker John M.
Baldocchi Dennis D.
Barral Hélène
Bernacchi Carl J.
Biraud Sébastien C.
Bohrer Gil
Brunsell Nathaniel
Cappelaere Bernard
Castro‐Contreras Saulo
Cawse‐Nicholson Kerry
Christian Bernhofer
Chun Junghwa
Conrad Bryan J.
Cremonese Edoardo
De Ligne Anne
Demarty Jérôme
Desai Ankur R.
Dohlen Matthew B.
Fisher Joshua B.
Foltýnová Lenka
French Andrew
Goulden Michael L.
Griffis Timothy J.
Grünwald Thomas
Hain Christopher
Halverson Gregory H.
Hook Simon
Hulley Glynn
Johnson Mark S.
Kang Minseok
Kelbe Dave
Kowalska Natalia
Lee Brian
Lim Jong‐Hwan
Maïnassara Ibrahim
McCabe Matthew F.
Missik Justine E.C.
Mohanty Binayak P.
Moore Caitlin E.
Morillas Laura
Morrison Ross
Munger J. William
Posse Gabriela
Purdy Adam J.
Richardson Andrew D.
Russell Eric S.
Ryu Youngryel
Sanchez‐Azofeifa Arturo
Schmidt Marius
Schwartz Efrat
Sharp Iain
Tang Yao
Wang Audrey
Wood Eric
Šigut Ladislav
Publication venue: 'American Geophysical Union (AGU)'
Publication date: 01/01/2020

The ECOsystem Spaceborne Thermal Radiometer Experiment on Space Station (ECOSTRESS) was launched to the International Space Station on June 29, 2018. The primary science focus of ECOSTRESS is centered on evapotranspiration (ET), which is produced as level‐3 (L3) latent heat flux (LE) data products. These data are generated from the level‐2 land surface temperature and emissivity product (L2_LSTE), in conjunction with ancillary surface and atmospheric data. Here, we provide the first validation (Stage 1, preliminary) of the global ECOSTRESS clear‐sky ET product (L3_ET_PT‐JPL, version 6.0) against LE measurements at 82 eddy covariance sites around the world. Overall, the ECOSTRESS ET product performs well against the site measurements (clear‐sky instantaneous/time of overpass: r² = 0.88; overall bias = 8%; normalized RMSE = 6%). ET uncertainty was generally consistent across climate zones, biome types, and times of day (ECOSTRESS samples the diurnal cycle), though temperate sites are over‐represented. The 70 m high spatial resolution of ECOSTRESS improved correlations by 85%, and RMSE by 62%, relative to 1 km pixels. This paper serves as a reference for the ECOSTRESS L3 ET accuracy and Stage 1 validation status for subsequent science that follows using these data.
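For readers unfamiliar with the three agreement statistics quoted in this abstract, the sketch below shows one common way to compute them from paired observed and predicted values. The abstract does not state the exact definitions used, so the choices here are assumptions: r² is taken as the squared Pearson correlation, bias as the mean difference relative to the observed mean, and RMSE is normalized by the observed range. The function name `validation_metrics` and the sample numbers are hypothetical and are not data from the study.

```python
import math


def validation_metrics(observed, predicted):
    """Return r^2, relative bias, and range-normalized RMSE for paired samples.

    Assumed definitions (not taken from the paper):
      r2    = squared Pearson correlation between observed and predicted
      bias  = (mean(predicted) - mean(observed)) / mean(observed)
      nrmse = RMSE / (max(observed) - min(observed))
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_o * var_p)

    bias = (mean_p - mean_o) / mean_o

    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    nrmse = rmse / (max(observed) - min(observed))

    return {"r2": r2, "bias": bias, "nrmse": nrmse}


# Made-up latent heat flux values in W m^-2, with predictions 10% too high.
metrics = validation_metrics([100.0, 200.0, 300.0, 400.0],
                             [110.0, 220.0, 330.0, 440.0])
```

With these made-up inputs the predictions are perfectly correlated with the observations, so r² is 1 while the relative bias is 0.10, which illustrates that a high r² alone does not imply a small bias. Other normalizations of RMSE (for example by the observed mean) are also in use and would give different numbers.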

HAL-INSU

HAL-IRD

Juelich Shared Electronic Resources

Hal-Diderot

Repositorio Institucional – Biblioteca Digital

NERC Open Research Archive

In-situ estimation of ice crystal properties at the South Pole using LED calibration data from the IceCube Neutrino Observatory

Author: Abbasi Rasha
Ackermann Markus
Adams Jenni
Aggarwal Nakul
Aguilar Juanan
Ahlers Markus
Ahrens Maryon
Alameddine Jean-Marco
Alves Junior Antonio Augusto
Amin Najia Moureen Binte
Andeen Karen
Anderson Tyler
Anton Gisela
Argüelles Carlos
Ashida Yosuke
Athanasiadou Sofia
Axani Spencer
Bai Xinhua
Balagopal V Aswathi
Baricevic Moreno
Barwick Steve
Basu Vedant
Bay Ryan
Beatty James
Becker Tjus Julia
Becker Karl Heinz
Beise Jakob
Bellenghi Chiara
Benda Samuel
BenZvi Segev
Berley David
Bernardini Elisa
Besson Dave
Binder Gary
Bindig Daniel
Blaufuss Erik
Blot Summer
Bontempo Federico
Book Julia
Borowka Jürgen
Boscolo Meneguolo Caterina
Botner Olga
Bourbeau Etienne
Braun Jim
Brinson Bennett
Brostean-Kaiser Jannes
Burley Ryan
Busse Raffaela
Böser Sebastian
Böttcher Jakob
Campana Michael
Carnie-Bronca Erin
Chen Chujie
Chen Zheyang
Chirkin Dmitry
Choi Koun
Clark Brian
Classen Lew
Coleman Alan
Collin Gabriel
Connolly Amy
Conrad Janet
Coppin Paul
Correa Pablo
Countryman Stefan
Cowen Doug
Cross Robert
Dappen Christian
Dave Pranav
De Clercq Catherine
de Vries Krijn
de Wasseige Gwenhael
DeLaunay James
Delgado López Diyaselis
Dembinski Hans
Deoskar Kunal
Desai Abhishek
Desiati Paolo
DeYoung Tyce
Diaz Alejandro
Dittmer Markus
Dujmovic Hrvoje
DuVernois Michael
Díaz-Vélez Juan Carlos
Ehrhardt Thomas
Eller Philipp
Engel Ralph
Erpenbeck Hannah
Evans John
Evenson Paul
Fan Kwok Lung
Fazely Ali
Fedynitch Anatoli
Feigl Nora
Fiedlschuster Sebastian
Fienberg Aaron
Finley Chad
Fischer Leander
Fox Derek
Franckowiak Anna
Friedman Elizabeth
Fritz Alexander
Fürst Philipp
Gaisser Tom
Gallagher Jay
Ganster Erik
Garcia Alfonso
Garrappa Simone
Gerhardt Lisa
Ghadimi Ava
Glaser Christian
Glauch Theo
Glüsenkamp Thorsten
Goehlke Noah
Gonzalez Javier
Goswami Sreetama
Grant Darren
Gray Shannon
Griswold Spencer
Grégoire Timothée
Gutjahr Pascal
Günther Christoph
Ha Minh Martin
Haack Christian
Hallgren Allan
Halliday Robert
Halve Lasse
Halzen Francis
Hamdaoui Hassane
Hanson Kael
Hardin John
Harnisch Alexander
Hatch Patrick
Haungs Andreas
Helbing Klaus
Hellrung Jonas
Henningsen Felix
Heuermann Lars
Hickford Stephanie
Hill Colton
Hill Gary
Hoffman Kara
Hoshina Kotoyo
Hou Wenjie
Huber Thomas
Hultqvist Klas
Hussain Raamis
Hymon Karolin
Hünnefeld Mirco
IceCube Collaboration
In Seongjin
Iovine Nadege
Ishihara Aya
Jansson Matti
Japaridze George
Jeong Minjin
Jin Miaochen
Jones Ben
Kang Donghwa
Kang Woosik
Kang Xinyue
Kappes Alexander
Kappesser David
Kardum Leonora
Karg Timo
Karl Martina
Karle Albrecht
Katz Uli
Kauer Matt
Kelley John
Kheirandish Ali
Kin Ken'ichi
Kiryluk Joanna
Klein Spencer
Kochocki Alina
Koirala Ramesh
Kolanoski Hermann
Kontrimas Tomas
Kopper Claudio
Koskinen Jason
Koundal Paras
Kovacevich Michael
Kowalski Marek
Kozynets Tetiana
Krupczak Emmett
Kun Emma
Kurahashi Naoko
Köpke Lutz
Lad Neha
Lagunas Gualda Cristina
Larson Michael
Lauber Frederik
Lazar Jeffrey
Lee Jiwoong
Leonard Kayla
Leszczyńska Agnieszka
Lincetto Massimiliano
Liu Qinrui
Liubarska Maria
Lohfink Elisa
Love Christina
Lozano Mariscal Cristian Jesus
Lu Lu
Lucarelli Francesco
Ludwig Andrew
Luszczak William
Lyu Yang
Ma Wing Yan
Madsen Jim
Mahn Kendall
Makino Yuya
Mancina Sarah
Marie Sainte Wenceslas
Mariş Ioana
Marka Szabolcs
Marka Zsuzsa
Marsee Matthew
Martinez-Soler Ivan
Maruyama Reina
McElroy Thomas
McNally Frank
Mead James Vincent
Meagher Kevin
Mechbal Sarah
Medina Andres
Meier Maximilian
Meighen-Berger Stephan
Merckx Yarno
Micallef Jessie
Mockler Daniela
Montaruli Teresa
Moore Roger
Morse Bob
Moulai Marjon
Mukherjee Tista
Naab Richard
Nagai Ryo
Naumann Uwe
Nayerhoda Amid
Necker Jannis
Neumann Miriam
Niederhausen Hans
Nisa Mehr
Nowicki Sarah
O'Sullivan Erin
Obertacke Pollmann Anna
Oehler Marie
Oeyen Bob
Olivas Alex
Orsoe Rasmus
Osborn Jesse
Pandya Hershal
Pankova Daria
Park Nahee
Parker Grant
Paudel Ek Narayan
Paul Larissa
Peters Lilly
Peterson Josh
Philippen Saskia
Pieper Sarah
Pizzuto Alex
Plum Matthias
Popovych Yuiry
Porcelli Alessio
Prado Rodriguez Maria
Pries Brandon
Procter-Murphy Rachel
Przybylski Gerald
Pérez de los Heros Carlos
Raab Christoph
Rack-Helleis John
Rameez Mohamed
Rawlins Katherine
Rechav Zoe
Rehman Abdul
Reichherzer Patrick
Renzi Giovanni
Resconi Elisa
Reusch Simeon
Rhode Wolfgang
Richman Mike
Riedel Benedikt
Roberts Ella
Robertson Sally
Rodan Steven
Roellinghoff Gerrit
Rongen Martin
Rott Carsten
Ruhe Tim
Ruohan Li
Ryckbosch Dirk
Rysewyk Cantu Devyn
Safa Ibrahim
Saffer Julian
Salazar-Gallegos Daniel
Sampathkumar Pranav
Sanchez Herrera Sebastian
Sandrock Alexander
Santander Marcos
Sarkar Sourav
Sarkar Subir
Schaufel Merlin
Schieler Harald
Schindler Sebastian
Schlüter Berit
Schmidt Torsten
Schneider Judith
Schröder Frank
Schumacher Lisa
Schwefer Georg
Sclafani Steve
Seckel Dave
Seunarine Surujhdeo
Sharma Ankur
Shefali Shefali
Shimizu Nobuhiro
Silva Manuel
Skrzypek Barbara
Smithers Ben
Snihur Robert
Soedingrekso Jan
Soldin Dennis
Spannfellner Christian
Spiczak Glenn
Spiering Christian
Stamatikos Michael
Stanev Todor
Stein Robert
Stezelberger Thorsten
Stuttard Thomas
Stürwald Timo
Sullivan Greg
Søgaard Andreas
Taboada Ignacio
Ter-Antonyan Samvel
Thompson Will
Thwaites Jessie
Tilav Serap
Tollefson Kirsten
Toscano Simona
Tosi Delia
Trettin Alexander
Tung Chun Fai
Turcotte Roxanne
Twagirayezu Jean Pierre
Ty Bunheng
Tönnis Christoph
Unland Elorrieta Martin
Upshaw Karriem
Valtonen-Mattila Nora
van Eijndhoven Nick
van Santen Jakob
Vandenbroucke Justin
Vannerom David
Vara Javi
Veitch-Michaelis Joshua
Verpoest Stef
Veske Doga
Walck Christian
Wang Winnie
Watson Timothy Blake
Weaver Chris
Weigel Philip
Weindl Andreas
Weldert Jan
Wendt Chris
Werthebach Johannes
Weyrauch Mark
Whitehorn Nathan
Wiebusch Christopher
Willey Nathan
Williams Dawn
Wolf Martin
Wrede Gerrit
Wulff Johan
Xu Xianwu
Yanez Juan Pablo
Yildizci Emre
Yoshida Shigeru
Yu Shiqi
Yuan Tianlu
Zhang Zelong
Zhelnin Pavel
Publication venue: 'Copernicus GmbH'
Publication date: 01/01/2022

The IceCube Neutrino Observatory instruments about 1 km³ of deep, glacial ice at the geographic South Pole using 5160 photomultipliers to detect Cherenkov light emitted by charged relativistic particles. An unexpected light propagation effect observed by the experiment is an anisotropic attenuation, which is aligned with the local flow direction of the ice. Birefringent light propagation has been examined as a possible explanation for this effect. The predictions of a first-principles birefringence model developed for this purpose, in particular curved light trajectories resulting from asymmetric diffusion, provide a qualitatively good match to the main features of the data. This in turn allows us to deduce ice crystal properties. Since the wavelength of the detected light is short compared to the crystal size, these crystal properties include not only the crystal orientation fabric, but also the average crystal size and shape, as a function of depth. By adding small empirical corrections to this first-principles model, a quantitatively accurate description of the optical properties of the IceCube glacial ice is obtained. In this paper, we present the experimental signature of ice optical anisotropy observed in IceCube LED calibration data, the theory and parameterization of the birefringence effect, the fitting procedures of these parameterizations to experimental data, as well as the inferred crystal properties.

DESY

Copernicus Publications